
Fix creating a lot of ephemeral ports when stopping GlassFish. Fix restart on fast machines #25300

Merged: 7 commits merged into eclipse-ee4j:master on Dec 30, 2024


dmatej (Contributor) commented Dec 28, 2024

Fixes #25295 and #25292
Replaces #25293

There were several issues; see the individual commits. The main problem was that the startup of the new instance could be faster than the shutdown of the old one, so the two could collide on ports and files. The most problematic was the debug port, which stays open from JVM startup until the very end.

On my new machine, it was reproducible in about 80% of executions.

Solution for #25292:

  • Instead of busy-spinning on the remote port, which created a lot of ephemeral ports, we now open a single connection and wait until it is disconnected.
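A minimal sketch of the "connect once, wait for the disconnect" idea, assuming the stopping instance keeps a port open that a client can connect to. The names DisconnectWaiter and waitForDisconnect are hypothetical and this is not the GlassFish code; it only shows how one blocking read replaces a polling loop that opens a new connection (and a new local ephemeral port) on every check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical helper, not the GlassFish implementation. */
final class DisconnectWaiter {

    private DisconnectWaiter() {
    }

    /**
     * Opens ONE connection to host:port and blocks until the peer closes it.
     *
     * @return true if the peer is gone (closed, reset or refused), false on timeout
     */
    static boolean waitForDisconnect(String host, int port, long timeoutMillis) throws IOException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try (Socket socket = new Socket()) {
            try {
                socket.connect(new InetSocketAddress(host, port), toIntMillis(timeoutMillis));
            } catch (ConnectException e) {
                // Connection refused: nothing listens on the port any more.
                return true;
            }
            InputStream in = socket.getInputStream();
            byte[] buffer = new byte[256];
            while (true) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                socket.setSoTimeout(toIntMillis(remaining));
                try {
                    // Blocks until data arrives, the peer closes (-1) or the timeout fires.
                    if (in.read(buffer) < 0) {
                        return true;
                    }
                    // Data sent by the peer is ignored; keep waiting for the close.
                } catch (SocketTimeoutException e) {
                    return false;
                } catch (IOException e) {
                    // For example "connection reset" when the process dies abruptly.
                    return true;
                }
            }
        }
    }

    private static int toIntMillis(long millis) {
        // Socket timeouts are ints, and 0 would mean "wait forever".
        return (int) Math.max(1L, Math.min(millis, Integer.MAX_VALUE));
    }
}
```

The read blocks in the kernel until the peer closes, so no CPU is spent spinning and only one local port is used per wait. A refused connection is treated as "already stopped"; other connect failures are rethrown to the caller.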

For #25295, the following was also needed:

  • Move the startup of the new instance into a shutdown hook
  • The startup hook waits until the other GlassFish shutdown hooks (detected by name) finish
  • Logging is explicitly stopped
  • For extreme cases, I added additional logging for the dying and the starting process, which can be enabled by setting the environment variable: export AS_RESTART_LOGFILES=true
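One way to implement "the startup hook waits for the other shutdown hooks, detected by name" is sketched below. It assumes all relevant hooks are threads whose names share a common prefix; the prefix "GlassFish Shutdown Hook" and the class NamedHookCoordinator are hypothetical. The Runnable stands in for whatever actually launches the new JVM.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a restart hook that runs only after other named hooks finish. */
final class NamedHookCoordinator {

    /** Hypothetical naming convention shared by the hooks we have to wait for. */
    static final String HOOK_PREFIX = "GlassFish Shutdown Hook";

    private NamedHookCoordinator() {
    }

    /** Joins every live thread whose name starts with the prefix, except the caller. */
    static void waitForOtherHooks(String prefix, long timeoutMillisPerThread) throws InterruptedException {
        Thread self = Thread.currentThread();
        List<Thread> others = new ArrayList<>();
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            if (thread != self && thread.getName().startsWith(prefix)) {
                others.add(thread);
            }
        }
        for (Thread thread : others) {
            thread.join(timeoutMillisPerThread);
        }
    }

    /** Registers a named shutdown hook that starts the new instance last. */
    static Thread registerRestartHook(Runnable startNewInstance) {
        Thread hook = new Thread(() -> {
            try {
                waitForOtherHooks(HOOK_PREFIX, 10_000L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            startNewInstance.run();
        }, HOOK_PREFIX + " - restart");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Thread.getAllStackTraces() only reports live threads, and the JVM starts shutdown hooks in an unspecified order, so a hook that has not been started yet when the scan runs is not waited for; the sketch does not handle that race.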

Note

  • See https://bugs.openjdk.org/browse/JDK-8284282; it applies to our Jenkins and most Docker containers too. Terminated GlassFish instances become zombies, so handle#isAlive returns true and onExit.get stays blocked.

- The start succeeded too early and, on fast machines, collided with the shutdown.
- A shutdown hook is really the last thing in the JVM capable of doing it.
- All shutdown hooks have names now

Signed-off-by: David Matějček
…cases

- When the current (old) JVM had debugging enabled, the new one sometimes failed
  to start. It is not possible to wait for that from the inside.
- Stop the kernel after adding the last shutdown hook; shutdown hooks run
  in parallel, but we have to ensure that ours will be executed after all
  other non-daemon hooks finish.
- export AS_RESTART_LOGFILES=true to get "old" and "new" files in the server's
  log directory. It is a trivial workaround, because the standard logging system
  might get into a conflict with the new GF instance too.
- The "super debug" is not helpful, as it affects the timing.

Signed-off-by: David Matějček
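The AS_RESTART_LOGFILES switch could look roughly like the sketch below, assuming the "old" file is written directly by the dying JVM and the "new" file captures the output of the new process, both bypassing the standard logging system. The file names restart-old.log and restart-new.log and the class RestartLogFiles are hypothetical; the environment is passed in as a Map so it can be System.getenv() in production and a fixed map in tests.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;

/** Hypothetical sketch of an opt-in, file-based restart log. */
final class RestartLogFiles {

    static final String ENV_NAME = "AS_RESTART_LOGFILES";
    // Hypothetical file names; the commit only says "old" and "new" files.
    static final String OLD_LOG = "restart-old.log";
    static final String NEW_LOG = "restart-new.log";

    private RestartLogFiles() {
    }

    /** Enabled only for the value "true" (case-insensitive); unset means disabled. */
    static boolean enabled(Map<String, String> env) {
        return Boolean.parseBoolean(env.get(ENV_NAME));
    }

    /** Called by the dying JVM: appends a line without touching the logging system. */
    static void logOld(Path logDir, Map<String, String> env, String line) throws IOException {
        if (!enabled(env)) {
            return;
        }
        Files.writeString(logDir.resolve(OLD_LOG), line + System.lineSeparator(),
            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Sends the new JVM's stdout and stderr to the "new" file, or discards them. */
    static ProcessBuilder configureNew(ProcessBuilder builder, Path logDir, Map<String, String> env) {
        if (enabled(env)) {
            File file = logDir.resolve(NEW_LOG).toFile();
            builder.redirectErrorStream(true);
            builder.redirectOutput(ProcessBuilder.Redirect.appendTo(file));
        } else {
            builder.redirectOutput(ProcessBuilder.Redirect.DISCARD);
            builder.redirectError(ProcessBuilder.Redirect.DISCARD);
        }
        return builder;
    }
}
```

For example: RestartLogFiles.configureNew(new ProcessBuilder(command), logDir, System.getenv()).start().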
- Its only usage was for the domain restart, which was reimplemented.

Signed-off-by: David Matějček
dmatej added the bug (Something isn't working) label Dec 28, 2024
dmatej added this to the 7.0.21 milestone Dec 28, 2024
dmatej requested review from avpinchuk and a team December 28, 2024 16:15
dmatej self-assigned this Dec 28, 2024
- The server.log cannot be backed up if the server is dead.
- Using System.Logger instead of JUL

Signed-off-by: David Matějček
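For reference, the System.Logger API (java.lang.System.Logger, available since Java 9) is used as in the sketch below; the class name and messages are hypothetical. The parameterized variant formats with java.text.MessageFormat placeholders, not printf-style ones.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;

/** Hypothetical class showing System.Logger in place of java.util.logging. */
final class RestartLogging {

    private static final Logger LOG = System.getLogger(RestartLogging.class.getName());

    private RestartLogging() {
    }

    static void logWaiting(long pid, int attempt) {
        // Params are formatted by java.text.MessageFormat ({0}, {1}); numbers would get
        // grouping separators (e.g. "12,345"), so the pid is passed as a String.
        LOG.log(Level.INFO, "Waiting for process {0} to finish, attempt {1}.", Long.toString(pid), attempt);
        // Supplier variant: the message is only built if DEBUG is loggable.
        LOG.log(Level.DEBUG, () -> "Process " + pid + " is still alive.");
    }

    static String loggerName() {
        return LOG.getName();
    }
}
```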
- The original code caused local port exhaustion
- The original code used busy spinning instead of signals

Signed-off-by: David Matějček
dmatej force-pushed the fix-restart-on-fast-machines branch 14 times, most recently from 24b4717 to 3c040b2, December 29, 2024 15:59
dmatej (Contributor, Author) commented Dec 29, 2024

Eureka! Now I see why we had that weird code checking info()... Jenkins/k8s/Docker doesn't reap zombies, and then onExit().get() blocks forever.
https://bugs.openjdk.org/browse/JDK-8284282?jql=status%20in%20(Closed%2C%20Submitted)%20AND%20text%20~%20"ProcessHandle"

dmatej linked an issue Dec 29, 2024 that may be closed by this pull request
- Reverted the usage of ProcessHandle.onExit.get, as it doesn't work in containers
  that don't have a strict reaper. The zombie process is considered alive,
  and get then hangs forever.
- Added waitpid; however, it is not installed everywhere. If it is missing,
  we sleep for 1 second instead, which should be enough for the operating
  system to do the cleanup.

Signed-off-by: David Matějček
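The waitpid-or-sleep fallback can be sketched as follows, assuming an external waitpid command that takes the PID as its argument and exits once that process has terminated (for example the waitpid(1) tool from util-linux, which is not installed everywhere). ProcessExitWaiter and its Outcome enum are hypothetical names. The command name is a parameter only so the fallback path can be tested; in production it would be "waitpid".

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of "waitpid if available, otherwise sleep for 1 second". */
final class ProcessExitWaiter {

    enum Outcome { EXITED, FAILED, TIMED_OUT, FALLBACK_SLEEP }

    static final long FALLBACK_SLEEP_MILLIS = 1_000L;

    private ProcessExitWaiter() {
    }

    /**
     * @param command the external tool, normally "waitpid"; a parameter only for testing
     */
    static Outcome waitForExit(String command, long pid, long timeoutSeconds) throws InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command, Long.toString(pid))
            .redirectErrorStream(true)
            .redirectOutput(ProcessBuilder.Redirect.DISCARD);
        Process waiter;
        try {
            waiter = builder.start();
        } catch (IOException e) {
            // The tool is not installed: give the operating system time to clean up.
            Thread.sleep(FALLBACK_SLEEP_MILLIS);
            return Outcome.FALLBACK_SLEEP;
        }
        if (!waiter.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // Never hang forever, even if the tool itself does not return.
            waiter.destroyForcibly();
            return Outcome.TIMED_OUT;
        }
        return waiter.exitValue() == 0 ? Outcome.EXITED : Outcome.FAILED;
    }
}
```

Unlike ProcessHandle.onExit().get(), every path here is bounded: either by the timeout passed to Process.waitFor or by the fixed 1 second sleep.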
dmatej force-pushed the fix-restart-on-fast-machines branch from 3c040b2 to 190fd0b, December 29, 2024 16:12
dmatej marked this pull request as ready for review December 29, 2024 17:42
OndroMih changed the title from "Fix restart on fast machines" to "Fix creating a lot of ephemeral ports when stopping GlassFish. Fix restart on fast machines" Dec 29, 2024
OndroMih (Contributor) commented:
I renamed this PR so that it reflects everything that was fixed. The title will appear in the release notes, so it's good if it clearly describes everything this PR adds or fixes.

arjantijms merged commit 3252cc5 into eclipse-ee4j:master Dec 30, 2024
3 checks passed
dmatej deleted the fix-restart-on-fast-machines branch December 30, 2024 09:17